-
Notifications
You must be signed in to change notification settings - Fork 25.5k
Deleting sample when sampling configuration is deleted or changed #136123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deleting sample when sampling configuration is deleted or changed #136123
Conversation
Pinging @elastic/es-data-management (Team:Data Management) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds logic to handle cleanup of cached samples when sampling configurations are deleted or modified in the SamplingService's clusterChanged method.
- Implements cleanup logic in SamplingService.clusterChanged() to remove samples when their configurations are deleted or changed
- Adds comprehensive test coverage for various configuration change scenarios
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
SamplingService.java | Implements the core cleanup logic in clusterChanged method to remove samples when configurations are deleted or modified |
SamplingServiceTests.java | Adds comprehensive test case covering project deletion, metadata removal, configuration changes, and unchanged configuration scenarios |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
server/src/main/java/org/elasticsearch/ingest/SamplingService.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/ingest/SamplingService.java
Outdated
Show resolved
Hide resolved
).collect(Collectors.toSet()); | ||
for (ProjectId projectId : allProjectIds) { | ||
if (event.customMetadataChanged(projectId, SamplingMetadata.TYPE)) { | ||
SamplingMetadata oldSamplingConfig = event.previousState().metadata().hasProject(projectId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we rename this to oldSamplingMetadata and newSamplingMetadata? Otherwise it's easy to confuse this with the actual sampling configs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh good catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some drive-by comments, I hope you don't mind :)
Set<ProjectId> allProjectIds = Stream.concat( | ||
event.state().metadata().projects().values().stream().map(ProjectMetadata::id), | ||
event.previousState().metadata().projects().values().stream().map(ProjectMetadata::id) | ||
).collect(Collectors.toSet()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can just do this, right?
Set<ProjectId> allProjectIds = Stream.concat( | |
event.state().metadata().projects().values().stream().map(ProjectMetadata::id), | |
event.previousState().metadata().projects().values().stream().map(ProjectMetadata::id) | |
).collect(Collectors.toSet()); | |
Set<ProjectId> allProjectIds = Sets.union( | |
event.previousState().metadata().projects().keySet(), | |
event.state().metadata().projects().keySet() | |
); |
I'd personally be inclined to optimize that a bit more, because 99% of the time those two keysets will be equal, but that is only relevant when we have lots of projects, so I'm fine with labeling that as a premature optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha yeah you're right. I'll change that. I forgot that there was a metadata().projects()
(even though I'm using it)!
SamplingMetadata oldSamplingConfig = event.previousState().metadata().hasProject(projectId) | ||
? event.previousState().projectState(projectId).metadata().custom(SamplingMetadata.TYPE) | ||
: null; | ||
SamplingMetadata newSamplingConfig = event.state().metadata().hasProject(projectId) | ||
? event.state().projectState(projectId).metadata().custom(SamplingMetadata.TYPE) | ||
: null; | ||
Map<String, SamplingConfiguration> newSampleConfigsMap = newSamplingConfig == null | ||
? Map.of() | ||
: newSamplingConfig.getIndexToSamplingConfigMap(); | ||
Set<String> currentlyConfiguredIndexNames = newSampleConfigsMap.keySet(); | ||
Set<String> previouslyConfiguredIndexNames = oldSamplingConfig == null | ||
? Set.of() | ||
: oldSamplingConfig.getIndexToSamplingConfigMap().keySet(); | ||
Set<String> removedIndexNames = new HashSet<>(previouslyConfiguredIndexNames); | ||
removedIndexNames.removeAll(currentlyConfiguredIndexNames); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more of an optional code styling suggestion, so feel free to ignore if you prefer your current implementation.
SamplingMetadata oldSamplingConfig = event.previousState().metadata().hasProject(projectId) | |
? event.previousState().projectState(projectId).metadata().custom(SamplingMetadata.TYPE) | |
: null; | |
SamplingMetadata newSamplingConfig = event.state().metadata().hasProject(projectId) | |
? event.state().projectState(projectId).metadata().custom(SamplingMetadata.TYPE) | |
: null; | |
Map<String, SamplingConfiguration> newSampleConfigsMap = newSamplingConfig == null | |
? Map.of() | |
: newSamplingConfig.getIndexToSamplingConfigMap(); | |
Set<String> currentlyConfiguredIndexNames = newSampleConfigsMap.keySet(); | |
Set<String> previouslyConfiguredIndexNames = oldSamplingConfig == null | |
? Set.of() | |
: oldSamplingConfig.getIndexToSamplingConfigMap().keySet(); | |
Set<String> removedIndexNames = new HashSet<>(previouslyConfiguredIndexNames); | |
removedIndexNames.removeAll(currentlyConfiguredIndexNames); | |
Map<String, SamplingConfiguration> oldSampleConfigsMap = Optional.ofNullable(event.previousState().metadata().getProject(projectId)) | |
.map(p -> p.custom(SamplingMetadata.TYPE)) | |
.map(SamplingMetadata::getIndexToSamplingConfigMap) | |
.orElse(Map.of()); | |
Map<String, SamplingConfiguration> newSampleConfigsMap = Optional.ofNullable(event.state().metadata().getProject(projectId)) | |
.map(p -> p.custom(SamplingMetadata.TYPE)) | |
.map(SamplingMetadata::getIndexToSamplingConfigMap) | |
.orElse(Map.of()); | |
Set<String> removedIndexNames = new HashSet<>(oldSampleConfigsMap.keySet()); | |
removedIndexNames.removeAll(newSampleConfigsMap.keySet()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that too. Unfortunately though getProject() throws an exception rather than returning null if the project doesn't exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh but using projects().get(projectId) works fine. I'll switch to that.
event.previousState().metadata().projects().values().stream().map(ProjectMetadata::id) | ||
).collect(Collectors.toSet()); | ||
for (ProjectId projectId : allProjectIds) { | ||
if (event.customMetadataChanged(projectId, SamplingMetadata.TYPE)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're retrieving the SamplingMetadata
from both projects below, it doesn't really make sense to do this customMetadataChanged
here, as that gets the SamplingMetadata
from both projects as well. We might as well just get the two customs below and check their equality here. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That won't really save us anything other than a two hash map lookups will it? And it's possible that someone could optimize customMetadataChanged in the future and then we'd miss out on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd save the project lookup and the customs lookup, both twice. I agree that that's not much. I'll leave it up to you 👍
This adds logic to SamplingService's clusterChanged method so a samples are removed from memory whenever the sampling configuration for that sample is deleted or modified.